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Abstract 

Methods for the reduction of the complexity of computational prob¬ 
lems are presented, as well as their connections to renormalization, scal¬ 
ing, and irreversible statistical mechanics. Several statistically stationary 
cases are analyzed; for time dependent problem averaging usually fails, 
and averaged equations must be augmented by appropriate memory and 
random forcing terms. Approximations are described and examples are 
given. 


1 Introduction 

There are many problems in science which are too complex for numerical so¬ 
lution as they stand. Examples include turbulence and other problems where 
multiple scales must be taken into account. Such problems must be reduced 
to more amenable forms before one computes. In the present paper we would 
like to summarize some reduction methods that have been developed in recent 
years, together with an account of what was learned in the process. It is obvious 
that the problem has not been fully solved, but we think that the examples and 
the conclusions reached so far are useful. 

In general terms, a reduction to a more amenable form is a renormalization 
group transformation, as in physics — a transformation of a problem into a 
more tractable form while keeping quantities of interest invariant. A renormal¬ 
ization group transformation involves an incomplete similarity transformation 
(see below for definitions), and thus a reduction method is a search for hidden 
similarities. This a general feature of reduction methods, and it will be illus¬ 
trated in the examples. A successful problem reduction produces a new problem 
which must in some asymptotic sense be similar to the original problem. For 
general backgound on renormalization, see e.g. iniEiiEi 

In problems with strong time dependence, reduction methods resemble meth¬ 
ods for the analysis of thermodynamic systems not in equilibrium; indeed, those 
aspects of the problem that are ignored in a reduced description conspire to de¬ 
stroy order and increase entropy. Problem reduction for time-dependent prob¬ 
lems is basically renormalization group theory for non-equilibrium statistical 
mechanics. For background on such theory, see e.g. ElESlEI 

The content of the paper is as follows: In section|21we consider Hamiltonian 
systems and their conditional expectations. In section |21 we narrow the discus¬ 
sion to statistically stationary Hamiltonian systems and recover Kadanoff real- 
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space renormalization groups and an interesting block Monte-Carlo method. In 
section^Jwe display an example that exhibits and also extends the main features 
of this analysis in simple form. 

In section we explain the Mori-Zwanzig formalism for the reduction of 
statistically time-dependent problems. The analysis shows that averaging the 
equations is in general not enough; one must take into account noise and a tem¬ 
poral memory. The Mori-Zwazig formalism is rather dense, and in the sections 
that follow we present various special cases in which it can be simplified, in 
particular when the memory is very short or very long. 

For the sake of readability, we remind the reader of the rudiments of simi¬ 
larity theory |3]. Suppose a variable a is a function of variables ai, 02 ,..., am, 
5i, 62 ,..., where ai,.--,am have independent units, for example units of 
length and mass, while the units of bi,. ■ ■ ,bk, can be formed from the units 
of oi, 02 ,..., Om. Then there exist dimensionless variables 11 = ’ 

Hi = , * = 1 ,..., A:, where the a^, aij are simple fractions, such that 

n is a function of the 11 ^: 


n = $(ni,...,nfe). (i) 

This is just a consequence of the requirement that a physical relationship be 
independent of the size of the units of measurement. At this stage nothing can 
be said about the function d). Now suppose the variables 11^ are small or large, 
and assume that the function d> has a non-zero hnite limit as its arguments 
tend to zero or to infinity; then 11 ~ constant, and one hnds a power monomial 
relation between a and the a^. This is a complete similarity relation. If the 
function $ does not have the assumed limit, it may happen that for Hi small 
or large, d)(ni) = n“$i(ni) -I- ..., where the dots denote lower order terms, a 
is a constant, the other arguments of $ have been omitted and d*! has a finite 
non-zero limit. One can then obtain a scaling (power monomial) expression for 
a in terms of the Ui and bi, with undetermined powers which must be found 
by means other than dimensional analysis. The resulting power relation is an 
incomplete similarity relation. Of course one may well have functions d> with 
neither kind of similarity. 

Incomplete similarity expresses what is invariant under a renormalization 
group; all renormalization group transformations involve incomplete similarity, 
see the books already cited as well as written before the notion of incomplete 
similarity was formalized. The exponent a is called an anomalous exponent. 

The paper \2‘2\ is a survey of reduction methods organized along different 
lines and can be profitably read in tandem with the present paper. 


2 Averaging a Hamiltonian system 

We begin by examining what happens when one tries to reduce the complexity 
of a Hamiltonian system by averaging (see also Consider a system 
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of nonlinear ordinary differential equations, 

<^(0) = X, (2) 

where ip and x are n-dimensional vectors with components ipi and Xi, and R is 
a vector-valued function with components Rp, t is time. To each initial value x 
in m corresponds a trajectory (p{t) = ip(x,t). 

Suppose that we only want to find m of the n components of the solution 
vector ip(t) without hnding the n — m others. One has to assume something 
about the variables that are not evaluated, and we assume that at time t=0 we 
have a a joint probability density F(x) for all the variables. The variables we 
keep will have definite initial values xi, X2, ■ ■ ■, Xm-, and the rest of variables will 
then have a conditional probability density fm = , Xm, Xm+i, ■ ■ ■ )IZm, 

where Zm = f{xi,. ■ ■, Xm, Xm+i, ■ ■ ■ )dxm+idxm.+2 ■ ■■ is a normalization 
constant. Without some assumption about the missing variables the problem is 
meaningless; this particular assumption is reasonable because in practice / can 
often be estimated from previous experience or from general considerations of 
statistical mechanics. The question is how to use this prior knowledge in the 
evaluation of ip(t). 

Partition the vector x so that x = (xi, X2, ■ ■ ■, Xm), x = (xm+i, ■ ■ ■, Xn) 
and X = (x,x), and similarly ip = {(p,ip)^R = {R,R). In general the first m 
components of R depend on all the components oi ip, R = R(ip) = R{ip,ip); 
if they do not we have a system of m equations in m variables and nothing 
further needs to be done. We want to calculate only the variables ip; then 
{d/dt)ip{t) = R{<p{t)) where the right hand side depends on the variables ip which 
are unknown at time t. We shall call the variables ip the “resolved variables” 
and the remaining variables ip the “unresolved variables”. 

Consider in particular a Hamiltonian system as in mm There exists 
then a Hamiltonian function H = H{ip) such that for i odd Ri, the Tth com¬ 
ponent of the vector i? in (| 21 ) satishes Ri = dH jdipij^x while for i even one has 
Ri = —dHIdpi-i, with n, the size of the system, even. Assume furthermore 
that /, the initial probability density, is f{ip) = Z~^ exp{—H/T) where T is 
a parameter, known in physics as the “temperature”, which will be set equal 
to one in much, but not all, of the discussion below. In physics this density 
appears naturally and is known as the “canonical” density; the normalizing 
constant Z = Z{T) is the “partition function”. This density / is invariant, i.e. 
sampling it and evolving the system in time commute. 

A numerical analyst who wants to approximate the solution of an equation 
usually starts by approximating the equation. If one solves for the resolved 
variables one has values for the variables ip available at each instant t and 
the best approximation should be a function of these variables; it is natural 
to seek a best approximation in the mean square sense with respect to the 
invariant density / at each time; the best approximation in this sense is the 
conditional expectation E[R{ip)\(p\ = J dipjf e~^dip (note that we set T = 
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1). This conditional expectation is the orthogonal projection of R onto the 
space of functions of (p with respect to the inner product {u,v) = E[uv\ = 
f u((p)v((p)f((p)d(p, where d(p denotes integration over all the components of (p. 
We then try to approximate the system © by: 

ip(0) = X. (3) 


We have shown in ininiEig that: (i) The new system is also Hamilto¬ 
nian: 

= J exp(-iJ)d(^/ J exp{-H)d<p = (4) 

where i < m = the dimension of tp, and 


E 


dH 

dipi 


\^it) 


H = —log 


exp{—E[)dip 


(5) 


is the new Hamiltonian. 

(ii) The new canonical density / = Z~^ exp(—i?) is invariant in the evolution 
of the new, reduced, system. 

(hi) When the data are sampled from the canonical distribution, the distri¬ 
bution of ip in the new system is its marginal distribution in the old system; 
equivalently, the partition function Z is the same for the old system and for the 
new system. 

Now the question is, what does the solution p{t) of @ represent ? It does not 
approximate the first m components of the solution p{t) of 121)- the components 
of (p and the components of p live in spaces of different dimension and in general 
the components of the latter in those higher n—m dimensions are not small. One 
could hope that what the solution of © approximates is the vector E\p{t)\x\, the 
best estimate of the first components of the solution at time t given the partial 
initial information x. This is the case for linear systems (where averaging and 
time integration commute), and is approximately the case for limited time in 
some other special situations- nearly linear systems, some systems where the 
“unresolved variables” are fast. However, in general this is not the case. We 
shall see below that a reduced description of the solution of nonlinear systems 
in time requires in general “noise” and a “memory”. 

The lack of convergence can be understood by the following physics argu¬ 
ment. In physics a system in which the values of all the variables are drawn 
from a canonical distribution is a system in thermal equilibrium. The assign¬ 
ment of definite values x to the variables p at time t = 0 amounts to taking the 
system out of equilibrium at t = 0; if the system is ergodic it will then decay 
to equilibrium in time, so that all the variables become randomized and acquire 
the joint density /. Thus the predictive value of the partial initial data x de¬ 
creases in time; all averages of the p approach equilibrium averages. However, 
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Figure 1: Comparison of the evolution of E[(j)i{t)\cj)i{0), (j)2{0)] (truth), to the 
prediction by the ” Galerkin” approximation and the prediction by the averaging 
procedure described in the text. 

the reduced system @ is Hamiltonian, and the solutions it produces oscillate 
forever. 

In Figure 1 we consider the Hald Hamiltonian system (ca) with 



( 6 ) 


(physically, two linear oscillators with a nonlinear coupling). We assume that 
y)i(O), VJ2(0) are given and sample the two other initial data from the canonical 
distribution with T = 1. 

In Figure 1 are displayed (1) The result for tpi of a “Galerkin” calculation in 
which the unresolved variables are set to zero (this is what is implicitly done in 
many unresolved computations); (2) the result of the averaging procedure just 
described, and (3) the true E[(fi{t)\x], calculated by repeatedly sampling the 
initial data, solving the full system, and averaging. As one can see, averaging is 
initially better than the null “Galerkin” method, but in the long run the truth 
decays but the solution of the averaged system oscillates for ever. For more 
detail, see m 

The procedure we have just described resembles sufficiently the averaging 
methods used in some areas of engineering, for example the large-eddy simula¬ 
tion methods in turbulence (see e.g. m) and in some multiscale problems (see 
e.g. gH), to cast a very serious doubt on the broad validity of the latter. For 
a description of special cases, with small fluctuations and particular structures. 
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where this procedure is legitimate, see |^. 


3 Prediction with no data and block Monte-Carlo 


There is however a case where the construction of the preceding section can 
be very useful- when m = 0, i.e., when one tries to predict the future with no 
initial information. Equations then sample the canonical distribution and 
the reduced system samples a subset of variables without sampling the others, 
and, as we have seen, keep the statistics of the resolved variables unchanged 
(see m for an application to molecular dynamics). 

To see what is happening, suppose the variables Lpi are associated with nodes 
on a regular lattice, for example, they may represent spins in a solid, or originate 
in the spatial discretization of a partial differential equation. 

Divide the lattice into blocks of some fixed shape (for example, divide a 
regular one-dimensional lattice into groups of two contiguous nodes). We had 
not yet specified how the variables are to be divided into resolved and unre¬ 
solved. Now decide to “resolve” one variable per block, and leave the others 
in the same block unresolved. The transformation between the old variables 
and the smaller set of resolved variables is a Kadanoff renormalization group 
transformation m-, the Hamiltonian H defined above in equation 0 is the 
renormalized Hamiltonian. We will now explain what this means. 

Suppose the system described by the Hamiltonian is translation invariant. 
The equations of motion for any at any one point, say at the location labeled 
by 1, have the same form as the equations of motion at any any other point. 
The relation between the right hand side of the reduced system and the right 
hand side of the old system can be rewritten as: 


dH 

difi 



(7) 


where the expected value is with respect to the invariant density as before. This 
relation is the starting point for the actual evaluation of H. 

Hamiltonians are functions of the variables tp. They can be expanded in the 
form: 

H = ( 8 ) 

3 

where the ipj are “elementary Hamiltonians”. In a translation invariant system, 
where each equation has the same form as any other, the Hamiltonian is made 
up of sums over i of terms of the form h(ipjipj) for various values of j, where h 
is some function; these terms represent “couplings” between variables j apart; 
one can then choose the elementary Hamiltonians to be polynomials in XiXi+j 
with a hxed j in each ipj, i.e., one segregates the couplings between variables j 
apart into separate terms. 

In a homogeneous system where there is only one variable per site it is 
enough to satisfy 0 for one variable, say for ipi. Define ip' = noting that 
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though 'll) is necessarily a function with at least as many arguments as there are 
components on ip, "0' can be sparse. Equation 0 reduces to 


dH 

dipi 


j 3 


(9) 


with the projection P defined as before by Pg{ip) = for any function 

g of ip. Now we’re almost done. One can pick a basis in L2, the subspace of 
square integrable functions that depend only on the variables (p, which consists 
of a subset of the set of functions if)'. The right-hand of equation m is then 
a linear combination of 'ip's' integration with respect to pi requires only the 
erasure of the primes and yields a series for H. The elements of ip are now gone, 
and one can relabel the remaining variables ip so that the terms in the series 
have exactly the same form as before; the calculation can then be repeated, 
yielding a sequence of Hamiltonians with ever fewer variables: H, = H, 
.... The corresponding densities /" = exp(—can 
in principle be sampled by any sampling scheme, for example by Metropolis 
sampling (but there are caveats, see e.g. cni). 

At this point we have reduced the number of variables by a factor L equal 
to the number of variables in each block, but this may well seem to be a pyrrhic 
victory. The Hamiltonians one usually encounters are simple, in the sense that 
they involve few couplings- finite differences typically link a few neighboring 
variables, and so do the usual spin Hamiltonians in physics. As one reduces the 
number of variables, the new Hamiltonians become more complex, with more 
terms in the series 0; the cost per time step of solving the equations in time 
or of the cost per move in a Metropolis sampling typically increases fast as well. 
To see what has been gained one must turn to the physics literature (see e.g. 

mM); 

Consider the spatial correlation length i which measures the range of val¬ 
ues of \j\ over which the spatial covariances E[piPij^j\ are non negligible, and 
the correlation time r for which the temporal covariances E[pi{t)p{t + s)] are 
non-negligible. For very large and very small values of the temperature T (the 
variance parameter in the density /) both the correlation time and the corre¬ 
lation length are small; the properties of the system can then be found from 
calculations with a small number of variables and it is not urgent to reduce the 
number of variables. There is a range of intermediate values of T for which the 
correlation length and time for are large and then the reduction is worthwhile. 
There often is a value Tc of T, the “critical value”, for which ^ = 00 . Values of 
T around Tc are often of great interest. 

Now we can see what the reduction can accomplish. If one tries to compute 
averages with T near one finds that the cost of computation is proportional 
to T- one has to compute long enough to obtain independent samples of p, 
and a new independent sample will not appear until a time ^ r has passed. 
The reductions above produce a system with smaller i and r and therefore 
computation takes less time. Though we started with the declared goal of 
reducing the number of variables, what has been produced is more interesting: 
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a new system with shorter correlations which is more amenable to computation. 
It is not the raw number of variables that matters. 

The renormalization can be used with a multigrid scheme, in which one runs 
up and down on different levels of renormalization, on the finer ones to achieve 
accuracy and the cruder ones to move fast from one macroscopic configuration 
to another. A comparison with other multigrid sampling schemes (see e.g. |S]) 
reveals that we have derived a reasonably standard scheme, with however a 
particularly effective way to store conditional expectations. For details see cni 

An alternative method for obtaining the expansion coefficients for the renor¬ 
malized Hamiltonians was proposed in m The method is based on the max¬ 
imization of the likelihood of the renormalized density. The maximization of 
the likelihood leads to a moment-matching problem. The moments in this case 
are the expectation values of the ’’elementary Hamiltonians” (see above) with 
respect to the renormalized density. The solution of the moment matching 
problem yields the expansion of the renormalized Hamiltonian. 

The recognition of the links of probability with renormalization is largely 
due to Jona-Lasinio (see e.g. |27|1. The connection of renormalization with in¬ 
complete similarity is too well known (see I21E1IE2I) to require further comment 
here. 


4 An example: The Korteveg-deVries-Burgers 
equation 

As an illustration of the ideas in the previous section, consider the equation 

Ut + MMx = eUa;x - PUxxx, (10) 

with boundary conditions 

u{ — Oo) = Uo, m(-|-Oo) = 0, Ux( — Oo) = 0, (11) 

where the subscripts denote differentiation, x is the spatial variable, t is time, 
e > 0 is a diffusion coefficient, /3 > 0 is a dispersion coefficient and Uq > 0 
is a given constant. The boundary conditions create a traveling wave solution 
moving to the right (towards -l-oo) with velocity Uq/2 which becomes steady in 
a moving framework as t —*■ oo. In nondimensional form the equation can be 
written as: ^ 

Ut T UUx — ~p:Uxx “f Uxxx^ (1^) 

JrC 

with Ux{—oo) = 0, u(-l-oo) = 0, u{—oo) = 1; R = eVU/a is a “Reynolds 
number”. For i? < 1 the traveling wave has a monotonic profile, while for 
i? > 1 the profile is oscillatory, with oscillations whose wave length is of order 
1 |Z]. At zero diffusion {R = oo) the stationary asymptotic wave train extends 
to infinity on the left. For finite R the wave train is damped and the solution 
tends to 1 as a: decreases. 
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The steady wave profile can be found by noting that it satisfies an ordinary 
differential equation, whose solution connects a spiral singularity at a; = oo to 
a saddle point at a; = +oo. At the steady state we average the solution at each 
point X over the region (a: — ^/2, a; + ^/2) and call the result u. Now look for 
an effective equation g{v,Vx,Vxx, ■. ■) = 0 whose solution v approximates u; v 
can be expected to be smoother than the solution of 03 and thus require fewer 
mesh points for an accurate numerical solution. 

We now make an analogy between the conditional expectations which define 
the renormalized variables in the previous sections and an averaging in space 
which dehnes “renormalized” variables for solutions of the KdVB equations that 
are stationary in a moving frame. Averaging over an increasing length scale 
corresponds either to more renormalization steps or, equivalently, to renormal¬ 
ization with a greater number of variables grouped together. We pick a class 
of equations in which to seek the “effective” equation, the one whose solutions 
best approximate the averages of the true solution in the mean square sense; 
the choice of mean-square approximation in the KdVB case corresponds to the 
use of L 2 norms implied by the use of conditional expectations in the previous 
sections, and the choice of a class of equations in which to look for the effec¬ 
tive equation is analogous to the choice of a basis for the representation of the 
Hamiltonian; the calculation of the best coefficients in the chosen class of “ef¬ 
fective” equations corresponds to the evaluation of the coefficients in the series 
for the renormalized Hamiltonians. In the Hamiltonian case we average the 
right-hand-sides of the equations and in the analogous KdVB case we attempt 
to average the solutions; this must be so because in the KdVB case we do not 
have theorems which guarantee that averaging the right-hand-sides produces 
the correct statistics for the solutions. 

We can look for an effective equation in the class of equations of the form 

C-^x d” '^'^x ~ ^eff'^xx d“ '^xxx d” '^xx d- . . . , (1^) 

where e>0,a>0,/3>0 are constants and c = 1/2 is the velocity of propaga¬ 
tion of the steady wave (see also ^). The problem is to find the value of the 
parameters in the effective equation which minimizes 

/ -l-oo 

\u{x) — v{x)\'^dx. (14) 

-OO 

One hnds numerically that that the last terms have little effect on the minimum 
if / when .^ > 5 (in the physics terminology, they are “irrelevant”). The effective 
equation is thus a Burgers equation with a value of the dimensionless diffusion 
coefficient Eeff different from 1/R. 

The minimization in da was carried out in El, and it showed that the 
mimimun was achieved when eg// = with the exponent v ^ 0.75. Note 

that when the diffusion coefficient e —*■ 0, then e^ff —> oo!. This is an incomplete 
similarity relation, as advertised, relating a “bare” Reynolds number i? to a 
“dressed” Reynolds The form of the effective equation could conceivably 
have been found by averaging the original equation, but the relation between 
the original e and Cg// requires some form of renormalization-like reasoning. 


9 


5 The Mori-Zwanzig formalism 

We now return to the problem we started investigating in Section |2 How to 
determine the evolution of a subset ip of components of a vector p described by 
a nonlinear set of equations of the form ©■ This is a nonlinear closure problem 
of a type much studied in physics, and a variety of formalisms is available 
for the job. We choose the Mori-Zwanzig formalism of irreversible statistical 
mechanics [iniEiiMiEniEni, because it homes in on the basic difficulty, which 
is the description of the memory in the system; the relation of this formalism 
to other nonlinear formalisms is described in m That a reduced description 
of a nonlinear system involves a memory should be intuitively obvious: suppose 
you have n > 3 billiard balls moving about on top of a table and are trying to 
describe the motion of just three; the second ball may strike the seventh ball 
at a time ti and the seventh ball may then strike the third ball at a later time. 

The third ball then “remembers” the state of the system at time , and if this 

memory is not encoded in the explicit knowledge of where the seventh ball is 
at all times, then it has to be encoded in some other way. We are no longer 
assuming that the system is Hamiltonian nor that we know an invariant density. 

It is much easier to work with linear equations, and we start by finding a 
linear equation equivalent to (not approximating!) the system (|21l. Introduce 
the linear Liouville operator L = X]r=i Liouville equation: 

d 

—u{x,t) = Lu{x,t) 

u{x,0) = g{x), (15) 

with initial data g{x). This is the partial differential equation for which 
is the set of characteristic equations. One can verify that the solution of the 
Liouville equation is u{x, t) = g{p{x, t)) (see e.g 23|)- In particular, if g{x) = Xi, 
the solution is u{x,t) = pi{x,t), the i-th component of the solution of (|21). This 
linear partial differential equation is thus equivalent to the nonlinear system ©• 
The linearity of equation 03 greatly facilitates the analysis. 

Introduce the semigroup notation u{x, t) = (e*^g)(x) = g{p{x, t)), where 
is the evolution operator associated with the operator L; therefore e*^g{x) = 
g{e*^x), and one can also verify that (this can be seen to be a 

change of variables formula). Equation HI 5|l becomes 

= Le^^g = e*^Lg. 

We suppose that as before we are given the initial values of the m coordinates x, 
and that the distribution of the remaining n — m coordinates x is the conditional 
density, / conditioned by x, where / is initially given. 

We define a projection operator P by Pg = i?[g|i]. The conditioning vari¬ 
ables are the initial values of p; in section[21the conditioning variables were the 
values of pit), which are unusable here when we do not know the probability 
density at time t. Quantities such as Pp{t) = E[p{t)\x\ are by definition the 
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best estimates of the future values of the variables ip given the partial data x 
and are often the quantities of greatest interest. 

Consider a resolved coordinate ipj{x,t) = {j < m), and split its time 

derivative, Rj{(p{x,t)) = e^^Lxj as follows: 

d 

—e*^Xj = e*^Lxj = e^^PLxj + e^^QLxj, (16) 


where Q = I — P. Define Rj{x) = {PRj){x); the hrst term is e^^PLxj = 
R{ip{x,t)) and is a function of the resolved components only (but it is a func¬ 
tion of the whole vector of initial data). Note that if Q were zero we would 
recover something that looks like the crude approximation of the previous sec¬ 
tion; however the conditioning variables are not the same. We shall see that the 
term in Q is essential. 

We further split the remaining term e^^QLxj. This splitting will bring it 
into a very useful form: a noise term, and a memory term whose kernel depends 
on the correlations of the noise term. The fact that such a splitting is possible 
is the essence of “fluctuation-dissipation” theorems (see e.g ED)- 

Let w{x,t) = e^^^QLxj, i.e., let w{x,t) be a solution of the initial value 
problem: 

d 

—w(x,t) = QLw{x,t) = Lw{x,t) — PLw{x,t) 

at 

u>(x,0) = QLxj. (17) 

If for some function h(x), Ph = 0, then Pe^^^h = 0 for all time t, i.e., 
maps the null space of P into itself. 

The evolution operators and satisfy the Duhamel relation 


gti = ^tQL 


,(t-s)Lp^^sQL 


Hence, 


e^^QLxj = e*^^QLxj + 



e<'*-^)^PLe^QLQLxj ds. 


Collecting terms, we find 


(18) 


^e*^Xj = e^^PLxj + 

dt 



e^^-^'>^PLe‘^QLQLxj ds + 


e^^^QLxj 


(19) 


The first term on the right hand side is the Markovian contribution to 
dt<Pj{x,t) —it depends only on the instantaneous value of the resolved (p(x,t). 
The second term depends on x through the values of (p{x^ s) at times s between 
0 and t, and embodies a memory—a dependence on the past values of the re¬ 
solved variables. Finally, the third term, which depends on full knowledge of 
the initial conditions a:, lies in the null space of P and can be viewed as noise 
with statistics determined by the initial conditions. 
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It is important to see that equation ca is an identity. The memory and noise 
terms have not been added artificially, their presence is a direct consequence 
of the original equations of motion. However tempting it may be to average 
equations by taking one-time averages, the results will in general be wrong; one 
must add a memory and a noise as well. 

If what is desired is the conditional expectation of (p{t) given x (the 

best approximation in the sense of L 2 to (p given the partial data i), then one 
can premultiply equation ca by P; the noise term then drops out and we find 

= Pe^^PLxj + P PLe^^^QLxj ds (20) 

Even if the system we start with is Hamiltonian, the Langevin equation (Unj is 
not; the memory and the noise allow the system to forget its initial values and 
decay to “thermal equilibrium” as it should (see section |2J). 

We now show that the memory term is a functional of the temporal correla¬ 
tions of the noise. To save on writing we restrict ourselves to cases where the op¬ 
erator L is skew-symmetric, i.e, {Lu,v) = —(u^Lv), (remember (u^v) = E[uv]). 
The skew-symmetry holds in particular for Hamiltonian systems with canoni¬ 
cal data, see US],Hi; however, here the the assumption is skew-symmetry is 
only an excuse to reduce the number of symbols, not a return to the Hamil¬ 
tonian case. Pick an orthonormal basis {hk = hk{x), fc = 1,... } in the range 
of P, which is the space of functions of x (for example, the could be Her- 
mite polynomials in the variables x). Any function tl}{x,t), can be expanded as 
Ip = '^f^i'ipix^t), hk)hk(x), and in particular, 

P{LQe^^^QLxj) = '^{LQe^^^QLxj, hk)hk{x). ( 21 ) 

k 

where a factor Q has been inserted before the exponentials, harmlessly because 
the operators that follow it all live in the null space of P. The memory term 
now becomes 

[ e^^-’^'^^PLe’^^^QLxjds = f Y^e^*-'^'>^{LQe^^^QLxj,hk)hk{x)ds 
Jo Jo f. 

= {LQe’'^^QLxj,hk)hk{<p{t - s))ds-, (22) 

k -^0 

In the last identity we used the fact that the parenthesis is independent of time 
and therefore commutes with the time evolution operator and also the fact 
that e^*~^'>^hk{x) = hk{f>{t — s)) by definition. Now {LQe‘^^^QLxj,hk{x)) = 
— {e'^^^QLxj, QLhk{x)) by the symmetry of Q and the assumed skew-symmetry 
of L; each term on the right hand side of equation ( 1221 ) is the ensemble average of 
the product of the value of the stochastic process e^^^QLxj at time s = t with 
the value of the stochastic process e*^^QLhk{x) evaluated at time s = 0, i.e., it 
is a temporal correlation. All these stochastic processes are in the range of Q for 
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all t, they are therefore components of the noise. Remember that by definition 
Lxj = Rj (a right-hand side in equations ©)• PLxj is then an average of the 
right-hand side of @ and QLxj = Rj — E\Rj\x\ is the initial fluctuation in that 
right-hand side. 

The first, “Markovian”, term in equations looks straightforward, but 
perils lurk there as well. In general Rj in equations @ is nonlinear, and so is 
PLxj = E\Rj\x]. e^^PLxj is a nonlinear function of the functions (p{t) which 
depends on all the components of a;, not only on x. Some way of approximating 
this function must be found. If one looks for conditional expectations, one must 
find a way to commute P with a nonlinear function; for a discussion, see cni 
This bullet was dodged in section|21when the conditioning variables were chosen 
to be (p{t) which change in time, but it may be hard to dodge here. 

The task now at hand is to extract something usable from these rather cum¬ 
bersome formulas. A very detailed presentation of the analysis in this section 
can be found in m 

6 Fluctuation-dissipation theorems 

We have established a relation between kernels in the memory term and the 
noise (the former is made up of covariances of the latter). This is the math¬ 
ematical content of what are known as “fluctuation-dissipation theorems” in 
physics. However, under some specific restricted circumstances, the relation 
between noise and memory takes on more intuitively appealing forms, which 
we now briefly describe. In physics one often takes a restricted basis in the 
range of P consisting of the coordinate functions xi, ...,Xm (the components of 
x). The resulting projection is called there the “ linear projection” as if P as 
defined above were not linear. The use of this projection is appropriate when 
the amplitude of the functions is small. One then has hi^{x) = for 
k < m. The correlations in equation 03 are then simply the temporal corre¬ 
lations of the noise (not of the full solutions of the system!). This is known as 
the fluctuation-dissipation theorem of the second kind. 

Specialize further to a situation where there is a single resolved variable, say 
4>i, so that TO = 1 and (j) has a single component. The Mori-Zwanzig equation 
becomes: 



or, 




■t 


{Lxi,xi)(j>i{x,t) + e*'^^QLxi — / {e’^^^QLxi,QLxi)(f>i{x,t — s)ds, (23) 
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where we have again inserted a harmless factor Q in front of assumed that 
L was skew-symmetric as above, and for the sake of simplicity also assumed 
= 1 (if the last statement is not true the formulas can be adjusted 
appropriately). Take the inner product of equation 12dl) with Xi, you find: 

d 

— {(t)i{x,t),xi) = {Lxi,xi){4>-i{x,t),xi) 
ot 

+ {e*^^QLxi,xi) — f {e'^'^^QLxi,QLxi)(j)i{x,t — s)ds 

Jo 

= {Lxi,xi){4>iix,t),xi) - / {e''^^QLxi,QLxi){(l)i{x,t - s),xi)ds, (24) 

Jo 

because Pe*^^QLxi = {e*^^QLxi,xi)xi = 0 and hence {e*^^QLxi,xi) = 0. 
Multiply equation (Ell by Xi, and remember that P(j)i{x,t) = {(j)i{x,t), xi)xi. 
You find: 

d f* 

—P(l)i{x,t) = {Lxi, xi)P(j)i{x,t) — {e^'^^QLxi,QLxi)P(j)i{x,t —s)ds. (25) 

ot Jq 

You observe that the covariance {(j)t{x,t),xi) and the projection of ipi on xi 
obey the same homogenous linear integral equation. This is the fluctuation- 
dissipation theorem of the first kind, which embodies the Onsager principle, 
according to which spontaneous fluctations in a system decay at the same rate 
as perturbations imposed by external means, when both are small (so that the 
linear projection is adequate). This reasoning can be extended to cases where 
there are multiple resolved variables, and this is usually done with the added 
simplifying assumption that (xi,Xj) = 0 when i ^ j. We omit the details. 


7 Very short and very long memory approxima¬ 
tions 

The approximation we shall examine is some detail is: 

gtQL ^ ( 26 ) 

and we will consider under what conditions it is reasonable. We will hnd that it 
is reasonable both when memory is very short and when it is very long. The fact 
that the same approximation works for two opposite cases is not a paradox. The 
approximation EEl states that the orthogonal dynamics operator is very close 
to the full dynamics operator. In other words, the orthogonal dynamics, which 
evolve in a space orthogonal to that of the resolved variables, are insensitive 
to the coupling between resolved and unresolved variables. This can happen in 
particular when the orthogonal dynamics are very fast or when the orthogonal 
dynamics are very slow. The ansatz above should work when there is an effective 
decoupling of the equations for the resolved and unresolved variables. This raises 
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the question of what determines the range of the memory. Is it possible to have 
a reduced model with very short or very long memory, depending on how one 
coarse-grains a particular system at hand? In m evidence was presented that, 
fo! r the Kuramoto-Sivashinsky equation, the range of the memory of a reduced 
model can vary dramatically, depending on whether all the unstable modes in 
the system are resolved or not. The construction of a reduced model corresponds 
to renormalization, and the two extreme cases can be interpreted as two fixed 
points of a renormalization scheme. In which one a reduced model will end up 
depends on how one renormalizes. Finally, note that the Duhamel formula can 
be used for an iterative solution of the orthogonal dynamics equation. The term 

is the zero-th order term of an iterative solution for . This construction 
can be based on the use of Feynman diagrams. 

First we examine the case when the memory is short, i.e., when the various 
terms in the series m vanish for s beyond a small value; see m for a different 
approach to short-memory reduced model construction and m for comparison 
with the present short-memory approximation, as well as m and the references 
therein. 

The memory term in the Mori-Zwanzig equations CH) can be rewritten as 


[ e^^-^'>^PLe‘^^^QLxjds= [ PLQe‘^^^QLxj ds, 

Jo Jo 


(27) 


where the insertion of the extra Q is harmless. Adding and subtracting equal 
quantities, we find: 


PLe^^^QLxj = PLQe^^QLxj + PLQ{e^^^ — e‘‘^)QLa 


(28) 


a Taylor series yields: 

g.QL _ = / -b sQL + - I-sL- 


= -sPL + Ois"^), (29) 


and therefore, using QP = 0, we find: 

f e^^-’^'^^PLe^^^QLxjds= [ PLQe'^^QLxj ds + 0{t^). (30) 

Jo Jo 


/o JO 

If P is a finite rank projection then 


PLe^Q^QLxj = 'y^{QLe^^^QLxj,hk)hk{x). 


(31) 


where, as before, one can write {QLe‘^^^QLxj,hk) as —{e'^^^QLxj.QLhk) when 
L is skew-symmetric. If the correlations {e'^‘^^QLxj,QLhk) and also the corre¬ 
lations {e^^QLxj, QLhk) are significant only over short times s, the approxima¬ 
tion provides an acceptable approximation without requiring the solution of 
the orthogonal dynamics equation (see uni for an application to the dimensional 
reduction of the Kuramoto-Sivashinsky equation and |2 for an application to 
molecular dynamics). 
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The limiting case of the short-memory approximation is when the correla¬ 
tions are delta functions. There is a large literature on solving equations m 
with the assumption of delta function memory; usually this is done without 
explicit mention, as if it were an obvious property of stochastic systems- an 
astonishing state of affairs nearly 40 years after Alder and Wainwright demon¬ 
strated the long memory in a typical physical system pQ. All the dynamic (i.e., 
time-dependent) renormalization group methods we can find depend on this as¬ 
sumption ESI, and this remark goes a long way towards explaining their relative 
lack of success in applications. We will no longer bother making detailed com¬ 
parisons with this dynamic renormalization literature; the point of view here is 
that reduction on the basis of equations m is the right kind of renormaliza¬ 
tion, and anything with added drastic assumptions must be justified by appeal 
to that right kind. 

Nevertheless, there are important circumstances where the very short mem¬ 
ory assumption can be justihed, in particular in problems with separation of 
time scales, where the components of the unresolved variables, vary on 

much faster scales than the resolved variables (see e.g. mm)- One can then 
set 

e^^^QLxj = AjWj{t), (32) 

where the prime denotes a derivative, the Wj (t) are independent unit Brownian 
motions, and the Aj constants that must be derived from some prior knowledge. 
Assume further that the projection P is well represented by the physicists’ “lin¬ 
ear” projection and that the density used to perform the projections is invariant. 
The memory term becomes — s), equations HI bH become stochastic or¬ 

dinary differential equations of the usual kind. As usual (see e.g. EHI), the 
corresponding probability densities can be found via Fokker-Planck formalisms 
(or Kolmogorov equations, in mathematicians’ language). Everything is easier. 
There is a big literature on these methods which we recoil from surveying. 

It is often the case that the quantities of interest are the components of 
and the corresponding projection P is in general poorly approximated 
by the “linear” projection. The formalism above readily extends to more general 
projections, with more terms in the basis chosen in the range of P (see e.g. nni), 
as long as one assumes that the temporal correlations of the new terms are fast 
decaying functions. Terms that have long correlation times violate the ansatz 
(P|l and can hamper rather than enhance accuracy (see e.g. mi). A way to 
pick the fast decaying terms in the projection of the memory kernel for problems 
that exhibit separation of time scales was presented in mi- We should note 
here that projections which include higher than linear terms are at the heart of 
mode-coupling theory (see e.g. mi), which has proved very effective in tackling 
problems in condensed matter physics. 

We examine now the validity of the ansatz for cases with slowly 

decaying memory. Write the memory term in the Mori-Zwanzig equation null 
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as 

Le^*-^)^e^Q^QLxjds 

e^^-’^'>^esQLQLQLxjds, 

where we have used the commutation of L and QL with e*'^ and re¬ 

spectively. At this point, make the approximation which eliminates the s 
dependence of both integrands and we have 

PLe‘^^^QLxjds = te^^PLQLxj. 

All that remains of the integration in time is the coefficient t. One can get rid of 
the noise term by premultiplying equations by a projection P, as in equation 
CT . and obtain a reduced non-autonomous set of differential equations. This 
approximation was named the t-model in (see m for an application to the 
dimensional reduction of a nonlinear Schrodinger equation). Other cases where 
non-Markovian models can be approximated by Markovian equations with time- 
dependent coefficients can be found in m- 

We proceed to examine the order of accuracy of this approximation. We 
have 



f e^*-^^^PLe^^^QLxjds= [ 
Jo Jo 


[ e^*-‘^^^PLe^^^QLxjds - te^^PLQLxj = 

Jo 

[ - e^^PL]QLxjds. 

Jo 

Adding and subtracting equal quantities we find 

g(t-s)Lp^gsQL ^ gtipp p gtLjg-sLppgsQL _ ppj^ 

and a Taylor series around s = 0 gives 

- PL = {I - sL + .. .)PL{I + sQL + ...)-PL = 0(s). (33) 

This implies 

[ e^^-‘^'>^PLe’^^^QLxjds = te^^PLQLxj + 0{t^). 

Jo 

The OiiJ) error estimate can be put into perspective by examining an alternate 
derivation of the t-model. If we expand the integrand of the memory term of 
the Mori-Zwanzig equation around s = 0 and retain only the leading term, we 
find 

[ e^^-^'^^PLe^^^QLxjds= [ [e^^PLQLxj + Ois)]ds 
Jo Jo 

= PLQLxj + 0{t^). 
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If we retain only the leading term, we do not keep any information about the 
time evolution of the integrand, which in turn means no information about the 
evolution of the resolved component and of the coupling to the orthogonal dy¬ 
namics (through the term [[LQe^^^QLxj,hk))- Such a drastic approximation 
is expected to be appropriate in cases where the memory term integrand is 
slowly decaying, so that information about its initial value is enough. 

As an example, consider again the Hald model whose Hamiltonian is 

= 2 + '/'i + '/'s + 1^4 + (34) 

The resulting equations of motion are: 

d(j)i 
dt 
d(j>2 
dt 
d(l>3 
dt 
d/pA 
dt 

Suppose one wants to solve only for p = (t/fi, ^2), with initial data x = {xi, X 2 )- 
Assume the initial data x^, X4 are sampled from a canonical density with temper¬ 
ature T = 1. A quick calculation yields E[xWxi,X 2 \ = 1/(1 + xl). the advance 
in time described by the multiplication by requires just the substitution 
X —>■ (p. If one commutes the nonlinear function evaluation and the conditional 
averaging, i.e., writes Pfip) = fiPp) ( a “mean-field approximation”), and 
writes furthemore $(t) = Pp = £’[(/|x] one finds Pe^^PLxi — $2, Pe^^PLx 2 = 
— $1(1 -I- 1/(1 -I- $1)); one can calculate Pe^'^LQLxj for j = 1, 2 and finally one 
finds: 


= (p2 

= —/>l(l + (P 3 ) 
= (pi 

= -(p3{l + (pl). 


$1 = $2 
dt 

d 

—$2 = -$1(1 
dt 


1 

1 -k 


) - 2t 


$ 2^2 

(l + $?)2- 


(35) 


The last term represents the damping due to the loss of predictive power 
of partial data; the coefficient of the last term increases in time and one may 
worry that this last term eventually overpowers the equations and leads to some 
odd behavior. This is not the case. Indeed, one can prove the following. If the 
system one starts from, equation m is Hamiltonian with Hamiltonian H, and 
if the initial data are sampled from an initial canonical density conditioned 
by partial data x, and if H is the renormalized Hamiltonian ( in the sense of 
Section mi, then {d/dt)H < 0, showing that the components of (p decay as they 
should. The proof requires a technical assumption ( that the Hamiltonian H 
can be written as the sum of a function of p and a function of q, a condition 
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commonly satisfied) and we omit it (see ^Hl)- The reduced system was 
solved numerically in m with gratifying results. 

The t-model is the zero-th order term in a Taylor expansion (around s = 0) 
of the integrand of the memory term in However, nothing prevents us 

from keeping more terms in this expansion. Let 

- s),s) = 

and expand K around s = 0, i.e. 

- s),s) = K{^{t),0) + s-^\s=o + + 0{s^)- 

In the case when P is the finite-rank projection and the density used to define 
the projection is invariant, the derivatives of iiT at s = 0 are equal-time (static) 
correlations. In mode-coupling theory, such expressions are known as sum rules. 
One can assume a functional form for the memory term integrand around s = 0, 
e.g. a Gaussian , and use the derivatives of iiT at s = 0 to estimate a, b 

(see m for more on sum rules and mode-coupling theory). 

8 Intermediate-range memory 

There are intermediate cases where the memory is sufficiently long-range for the 
short-memory approximation to break down, yet not so slowly decaying that 
the t-model can give accurate results. At present, it is not known how to deal 
effectively with such cases. In a series of papers mm we presented special 
cases and their solutions. In particular in m we presented a detailed analysis 
of the Hald system. We showed that the memory decays roughly at the same 
rate as the solution itself ( this is the general case in the absence of separation 
of scales). We expanded the various correlation functions at equilibrium (i.e., 
when there are no resolved variables) in Hermite polynomials, evaluated the 
coefficients in the expansions by Monte-Carlo once and for all, and then obtained 
a system of integro-differential approximations to equations which we then 
solved in various cases. This is a legitimate procedure which may be useful when 
the same system of equations has to be solved repeatedly. These calculations do 
exhibit a salient feature of model reduction in time-dependent problems, which 
is that its set-up costs are often very high. The future remedy, if there is one, 
will surely lie in a deeper understanding of dynamical renormalization and in 
particular of the way memory depends on scale. 
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